Recent results are reviewed on both the time evolution and retrieval properties 
of multi-state neural networks that are based upon spin-glass models. In partic- 
ular, the properties of models with neuron states having Q-Ising symmetry are 
discussed for various architectures. The main common features and differences are 
highlighted. 

1 Introduction 

Artificial neural networks have been widely applied to memorize and retrieve
information. In recent years there has been considerable interest in neural
networks with multistate neurons (see, e.g., [1]-[3] and references cited
therein). Basically, such models can function as associative memories for
grey-toned or coloured patterns [4], [5] and/or allow for a more complicated
internal structure of the retrieval process, e.g., a distinction between the exact
location and the details of a picture in pattern recognition and the analogous
problem in the framework of cognitive neuroscience [6], or a combination of
information retrieval based on skills and based on specific facts or data [7], [8].

In analogy with the well-known Hopfield model [9], [10] the models we discuss
here are built from spin-glasses (see [11] for the Hopfield model) with couplings
defined in terms of embedded patterns through a learning rule. Since one of the
aims of these networks is to recover the embedded patterns as attractors of the
retrieval process, they are also interesting from the point of view of dynamical
systems.

Different types of multi-state spins (= neurons) can be distinguished according
to the symmetry of the interactions between the different states. The states of the
Q-Ising neuron can be represented by scalars, and the interaction between two
neurons can then be written as a function of the product of these scalars. So the
Q states of the neuron can be ordered like a ladder between a minimum and a
maximum value, usually taken to be $-1$ and $+1$. Special cases are Q = 2, i.e., the
Hopfield model, and $Q = \infty$, i.e., the analogue or graded response neuron. The
states of the phasor or clock neuron can be represented by vectors in the complex
plane that are placed (equally spaced) on the unit circle. The interaction between
two neurons can then be written as a function of the real part of the product of
these vectors indicating the state of the two neurons. The Q-Potts neuron states
can be represented by $(Q-1)$-dimensional vectors that are placed on the vertices of
a regular $(Q-1)$-dimensional simplex, and the interaction between two neurons is
then a function of the scalar product of these vectors, which is either $(Q-1)/Q$
or $-1/Q$. For Q = 2 the three types of neurons are the same after proper rescaling,
and for Q = 3 the phasor and the Potts neurons are equivalent.

The neural network models built with these multi-state neurons have an immediate
analogue in spin-glass systems (cf., e.g., [12] and [13], respectively [14],
[15]). Of course, these types of multi-state neurons do not exhaust all possibilities
for constructing models. We also mention the recently considered Ashkin-Teller
and Blume-Emery-Griffiths neural network models that are based again upon their
spin-glass counterparts (see, e.g., [16], respectively [17], and references therein),
because, as we will argue, they are especially relevant for modeling more
sophisticated features of real biological networks and/or from an information
theoretic point of view.

Besides these neuron states one also needs to specify an architecture indicating
how the neurons are connected with each other. Several architectures have been
studied in the literature for different purposes. From a practical applications point
of view, mostly perceptrons or, more generally, layered feedforward networks have
been used for a very long time. Fully connected attractor networks with symmetric
couplings, like the Hopfield model, satisfy the detailed balance principle and hence
a Hamiltonian can be defined. The behaviour of such a network can then be studied
by focusing on this Hamiltonian. An important feature of these attractor networks
is the occurrence of feedback [18]. Diluted architectures, where only a fraction
of the neurons are connected, are relevant both from the biological point of view
and to model the breakdown of synaptic couplings causing loss of information.
In particular, symmetrically extremely diluted models still allow for a Hamiltonian
description but some feedback survives. Asymmetrically extremely diluted models
are considered because their dynamics can be solved exactly since there are no
feedback correlations.

Finally, one needs to give an explicit learning rule for the couplings (e.g., Hebb
[19], pseudo-inverse [20]) or a strategy to find the couplings giving the best
performance (Gardner method [21], [22]).

For a more complete overview of the field from a physics point of view we
refer to the textbooks [23]-[29], and to [30]-[33].

Here we review some of the most recent results on multi-state neural networks. 
In particular, we focus on the models with Q-Ising symmetry, i.e., the Q-Ising 
network mainly with Q = 3, and the Blume-Emery-Griffiths network. Both the 
dynamical time evolution and the thermodynamic and retrieval properties for these 
models with various architectures and a Hebb-type learning rule are discussed. 

The methods used are standard by now but have to be slightly extended to
accommodate the multi-state character of the neuron. First, in order to study the
time evolution under parallel updates of the neurons we mainly use the
signal-to-noise analysis. There exist different versions of this method in the
literature, e.g., [34]-[39] (see [28] for further references), [40], [41].

In more detail, splitting the local field of the model in a signal part from the 
condensed patterns and a noise part from the rest of the patterns, and employing 
systematically the law of large numbers (LLN) and the central limit theorem (CLT) 
we derive the evolution of the distribution of the local field at every time step. 
This allows us to obtain a recursive scheme for the evolution of the relevant order 
parameters in the system. The details of this approach depend in an essential way 
on the architecture because different temporal correlations are possible. 

For extremely diluted asymmetric [42]-[51] (and references therein) and layered
feedforward architectures [52]-[56] (and references therein) recursion relations
are obtained in closed form directly for the relevant order parameters. This
is possible because in these types of networks there are no feedback correlations
as time progresses. As a technical consequence the local field contains only
Gaussian noise, leading to an explicit solution.

For the parallel dynamics of networks with symmetric connections, however,
things are quite different [2], [40], [41], [57]-[59] (and references therein). Even
for extremely diluted versions of these systems [60]-[63] (and references therein)
feedback correlations become essential from the second time step onwards,
complicating the dynamics in a nontrivial way. Therefore, explicit results concerning
the time evolution of the order parameters for these models have to be obtained 
indirectly by starting from the distribution of the local field. Technically speaking, 
both for the symmetrically diluted and fully connected architectures the local field 
contains both a discrete and a normally distributed part. In both cases this discrete 
part prevents a closed-form solution of the dynamics for the relevant order param- 
eters. Nevertheless, the development of a recursive scheme is possible in order to 
calculate their time evolution. 

By requiring through these recursion relations that the local field becomes time- 
independent implying that most of the discrete noise part is neglected, we can 
obtain stationary equations for the order parameters. 

Since no closed-form solution of this dynamics is possible and the results are
technically complicated, it is worthwhile to apply an alternative method, the
generating functional approach [64], [65] (for a recent review see, e.g., [66] and
references therein), to solve this feedback dynamics. This approach enables one to
find all relevant physical order parameters at any time step via the derivation of
the generating functional. Comparing this approach with the signal-to-noise ratio
analysis and with numerical simulations, it turns out [59] that beyond the third
time step of the dynamics the signal-to-noise analysis, as applied in the literature
mentioned above, is not completely correct for those parameters of the system
corresponding to spin-glass behaviour. The full details of this, showing that a
technical assumption concerning the feedback correlations is not valid, although it
has little effect in most of the retrieval region of the networks, are worked out
first for the simpler Q = 2 Hopfield model [67] and are beyond the scope of the
present overview.

Secondly, the fixed-point equations for the symmetric models, which are governed
by a Hamiltonian, can also be derived using thermodynamic replica mean-field
theory [11], [68]. This allows us to write down an expression for the free
energy and obtain from it fixed-point equations for the order parameters.
Thermodynamic and retrieval properties, e.g., the maximal storage capacity, can be
discussed through the appropriate phase diagrams. Most results in the multi-state
literature treat models with sequential updating in the replica-symmetric
approximation, e.g., [69]-[73] (and references therein) for the Q-Ising and [74] for
the Blume-Emery-Griffiths model. These works use Hebb-type learning rules for the
couplings. To obtain the optimal storage capacity by finding the optimal couplings
which give the best performance of the network for a specific set of patterns, the
Gardner approach can be applied to these models. This method treats the couplings
as dynamical variables and uses a replica analysis to calculate the minimal volume
fraction of coupling space ensuring that this specific set of patterns can
still be embedded in the network with a certain basin of attraction. Results for
multi-state networks with Q-Ising type neurons can be found, e.g., in [75]-[77]
(and references therein).

We remark that the Ashkin-Teller neural network briefly mentioned above will
not be discussed here. For recent results on this model and its relation to other
networks we refer to [78], [79] and to [80].

The rest of this contribution is organized as follows. In Sections 2 and 3 we
consider the Q-Ising model and the Blume-Emery-Griffiths (BEG) model, respectively.
Each section is divided into three subsections. Subsection 1 defines the model, its
dynamics, its relevant order parameters and its measures for the retrieval quality.
In subsection 2 we use the signal-to-noise analysis in order to derive a recursive
scheme for the evolution of the distribution of the local field, leading to recursion
relations for the order parameters. The differences between the various
architectures are outlined. Subsection 3 discusses the statics of the model,
describing the phase diagrams and focusing on the retrieval properties. In Section 4
we briefly describe the results of a Gardner approach to these models. Finally, a
short conclusion is given in Section 5.

This review is limited in both scope and length so that some details and/or 
contributions could not be mentioned. They are referred to, directly or indirectly, 
in the references. 

2 Q-Ising neural networks 
2.1 The model 

Consider a neural network consisting of N neurons which can take values $\sigma_i$,
$i = 1, \ldots, N$, from a discrete set $S = \{-1 = s_1 < s_2 < \ldots < s_Q = +1\}$. The p
patterns to be stored in this network are supposed to be a collection of independent
and identically distributed random variables (i.i.d.r.v.), $\{\xi_i^\mu \in S\}$,
$\mu = 1, \ldots, p$, with zero mean, $E[\xi_i^\mu] = 0$, and variance
$A = \mathrm{Var}[\xi_i^\mu]$. The latter is a measure for the activity of the
patterns. We remark that for simplicity we have taken the patterns and the neurons
out of the same set of variables but this is no essential restriction. Given the
configuration $\sigma_N(t) \equiv \{\sigma_j(t)\}$, $j = 1, \ldots, N$, the local field
in neuron i equals

$$ h_i(\sigma_N(t)) = \sum_{j=1}^{N} J_{ij}(t)\, \sigma_j(t) \qquad (1) $$

with $J_{ij}$ the synaptic coupling from neuron j to neuron i. In the sequel we write
the shorthand notation $h_{N,i}(t) \equiv h_i(\sigma_N(t))$.

It is clear that the $J_{ij}$ explicitly depend on the architecture. For the extremely
diluted (ED), both symmetric (SED) and asymmetric (AED), and the fully connected
(FC) architectures the couplings are time-independent and the diagonal
terms are absent, i.e. $J_{ii} = 0$. The configuration $\sigma_N(t = 0)$ is chosen as
input. For the layered feedforward (LF) model the time dependence of the couplings
is relevant because the set-up of the model is somewhat different. There, each
neuron in layer t is unidirectionally connected to all neurons on layer t + 1 and
$J_{ij}(t)$ is the strength of the coupling from neuron j on layer t to neuron i on
layer t + 1. The state $\sigma_N(t+1)$ of layer t + 1 is determined by the state
$\sigma_N(t)$ of the previous layer t.

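In all cases the couplings are chosen according to the Hebb rule such that we
can write

$$ J_{ij}^{ED} = \frac{c_{ij}}{CA} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu \quad (i \neq j), \qquad (2) $$

$$ J_{ij}^{FC} = \frac{1}{NA} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu \quad (i \neq j), \qquad (3) $$

$$ J_{ij}^{LF}(t) = \frac{1}{NA} \sum_{\mu=1}^{p} \xi_i^\mu(t+1)\, \xi_j^\mu(t), \qquad (4) $$

with the $\{c_{ij} = 0, 1\}$, $i, j = 1, \ldots, N$, chosen to be i.i.d.r.v. with distribution
$\Pr\{c_{ij} = x\} = (1 - C/N)\,\delta_{x,0} + (C/N)\,\delta_{x,1}$ and satisfying for
symmetric dilution $c_{ij} = c_{ji}$, $c_{ii} = 0$, and for asymmetric dilution that
$c_{ij}$ and $c_{ji}$ are statistically independent (with $c_{ii} = 0$).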
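
As an illustration of the Hebb rule with dilution, the following minimal Python sketch builds the couplings for a small network. It assumes the normalisation $1/(NA)$ for the fully connected case and $c_{ij}/(CA)$ for the diluted cases; the function name `hebb_couplings` and its arguments are hypothetical and chosen only for this example.

```python
import random

def hebb_couplings(patterns, A, C=None, symmetric=True, seed=0):
    """Hebb couplings J[i][j] (sketch, not the author's code).

    patterns : list of p lists of length N with entries from the state set S
    A        : pattern activity (variance), used for normalisation
    C        : mean connectivity; None means fully connected (normalise by N)
    """
    rng = random.Random(seed)
    N = len(patterns[0])
    norm = (N if C is None else C) * A
    # dilution variables c[i][j] in {0, 1}; the diagonal stays zero
    c = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            if C is None:
                c[i][j] = c[j][i] = 1
            elif symmetric:
                # symmetric dilution: one draw shared by both directions
                c[i][j] = c[j][i] = 1 if rng.random() < C / N else 0
            else:
                # asymmetric dilution: independent draws for i<-j and j<-i
                c[i][j] = 1 if rng.random() < C / N else 0
                c[j][i] = 1 if rng.random() < C / N else 0
    J = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if c[i][j]:
                J[i][j] = sum(xi[i] * xi[j] for xi in patterns) / norm
    return J, c
```

Calling `hebb_couplings(patterns, A=2/3)` gives the fully connected couplings, while passing `C` and `symmetric=False` gives an asymmetrically diluted realisation.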

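All neurons are updated in parallel through the spin-flip dynamics defined by
the transition probabilities

$$ \Pr\{\sigma_i(t+1) = s_k \in S \mid \sigma_N(t)\} = \frac{\exp[-\beta\, \epsilon_i[s_k|\sigma_N(t)]]}{\sum_{s \in S} \exp[-\beta\, \epsilon_i[s|\sigma_N(t)]]}. \qquad (5) $$

Here the energy potential $\epsilon_i[s|\sigma_N]$ is defined by [69]

$$ \epsilon_i[s|\sigma_N] = -\frac{1}{2}\left[h_i(\sigma_N)\, s - b s^2\right], \qquad (6) $$

where $b > 0$ is the gain parameter of the system. The zero temperature limit
$T = \beta^{-1} \to 0$ of this dynamics is given by the updating rule

$$ \sigma_i(t) \to \sigma_i(t+1) = s_k : \quad \min_{s \in S} \epsilon_i[s|\sigma_N(t)] = \epsilon_i[s_k|\sigma_N(t)]. \qquad (7) $$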

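This updating rule is equivalent to using a gain function $g_b(\cdot)$,

$$ \sigma_i(t+1) = g_b(h_{N,i}(t)), \qquad g_b(x) \equiv \sum_{k=1}^{Q} s_k \left[\theta\big(b(s_{k+1} + s_k) - x\big) - \theta\big(b(s_k + s_{k-1}) - x\big)\right] \qquad (8) $$

with $s_0 \equiv -\infty$ and $s_{Q+1} \equiv +\infty$, and with $\theta$ the Heaviside step
function. For finite Q, this gain function $g_b(\cdot)$ is a step function. The gain
parameter b controls the average slope of $g_b(\cdot)$.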
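
The equivalence between the zero-temperature updating rule and the step gain function can be checked numerically. The sketch below assumes the energy potential $-\frac{1}{2}[hs - bs^2]$ and thresholds at $b(s_{k+1} + s_k)$; the function names `gain`, `energy` and `argmin_energy` are illustrative only.

```python
def gain(x, b, states):
    """Step gain function g_b(x) for ordered states s_1 < ... < s_Q."""
    for k in range(len(states) - 1):
        # threshold separating s_k from s_{k+1}
        if x < b * (states[k] + states[k + 1]):
            return states[k]
    return states[-1]

def energy(s, h, b):
    """Energy potential -(1/2) [h s - b s^2] of state s in local field h."""
    return -0.5 * (h * s - b * s * s)

def argmin_energy(h, b, states):
    """State that minimises the energy potential (zero-temperature update)."""
    return min(states, key=lambda s: energy(s, h, b))
```

Away from the thresholds, where two states are degenerate, `gain(h, b, states)` and `argmin_energy(h, b, states)` return the same state.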

In order to measure the retrieval quality of the system one can use the Hamming
distance between a stored pattern and the microscopic state of the network

$$ d(\xi^\mu, \sigma_N(t)) \equiv \frac{1}{N} \sum_{i} \left[\xi_i^\mu - \sigma_i(t)\right]^2. \qquad (9) $$

This introduces the main overlap and the arithmetic mean of the neuron activities

$$ m_N^\mu(t) = \frac{1}{NA} \sum_{i} \xi_i^\mu \sigma_i(t), \qquad a_N(t) = \frac{1}{N} \sum_{i} [\sigma_i(t)]^2. \qquad (10) $$

We remark that for Q = 2 the variance of the patterns A = 1, and the neuron
activity a(t) = 1. For the LF architecture we recall that $\xi_i^\mu$ depends on t.
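
The relation between these retrieval measures can be made concrete with a short sketch. It assumes the overlap is normalised by $NA$ and the activity is the mean squared neuron state; expanding the square in the Hamming distance then gives $d = \frac{1}{N}\sum_i (\xi_i^\mu)^2 - 2 A\, m + a$. The function names are illustrative.

```python
def hamming_distance(xi, sigma):
    """Mean squared difference between a pattern and a network state."""
    return sum((x - s) ** 2 for x, s in zip(xi, sigma)) / len(xi)

def overlap(xi, sigma, A):
    """Main overlap, normalised by N*A (A is the pattern activity)."""
    return sum(x * s for x, s in zip(xi, sigma)) / (len(xi) * A)

def activity(sigma):
    """Arithmetic mean of the squared neuron states."""
    return sum(s * s for s in sigma) / len(sigma)
```

For Q = 2 with perfect retrieval (state equal to the pattern, A = 1) these functions give overlap 1, activity 1 and Hamming distance 0.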

In this overview we mainly consider the patterns to be uniformly distributed
(e.g., A = 2/3 for Q = 3). For low-activity networks (A small, e.g., $A \ll 2/3$
for Q = 3) a better measure for the retrieval quality is the mutual information. We
refer to the literature for a further discussion of this point [81]-[83] (and
references therein).

2.2 Solving the dynamics 
2.2.1 Correlations 

We first discuss some of the geometric properties of the various architectures which 
are particularly relevant for the understanding of their long-time dynamic behaviour. 

For a FC architecture there are two main sources of strong correlations between
the neurons complicating the dynamical evolution: feedback loops and the common
ancestor problem. Feedback loops occur when in the course of the time
evolution, e.g., the following string of connections is possible: $i \to j \to k \to i$.
We remark that architectures with symmetric connections always have these
feedback loops. In the absence of these loops the network functions in fact as a
layered system, i.e., only feedforward connections are possible. But in this layered
architecture common ancestors are still present when, e.g., for the sites i and j
there are sites in the foregoing time steps that have a connection with both i and j.

In AED architectures these sources of correlations are absent. This class of
neural networks was introduced in connection with Q = 2 Ising models [42]. We
recall that the couplings are then given by eq. (2) and that in the limit
$N \to \infty$ two important properties of this network are essential [42], [84]. The
first property is the high asymmetry of the connections, viz.

$$ \Pr\{c_{ij} = c_{ji} = 1\} = \left(\frac{C}{N}\right)^2, \qquad \Pr\{c_{ij} = 1 \wedge c_{ji} = 0\} = \frac{C}{N}\left(1 - \frac{C}{N}\right). \qquad (11) $$

Therefore, almost all connections of the graph
$G_N(c) = \{(i,j) : c_{ij} = 1,\ i \neq j,\ i, j = 1, \ldots, N\}$ are directed:
$c_{ij} \neq c_{ji}$. The second property in the limit of extreme
dilution is the directed local Cayley-tree structure of the graph $G_N(c)$. By the
arguments above the probability that k connections are directed towards a given
site i becomes a Poisson distribution in the limit of extreme dilution and the mean
value of the number of in (out) connections for this site i is C. The probability
that the sites i and i' have site j as a common ancestor is obviously C/N, hence
the probability that the sites i and i' have disjoint clusters of ancestors (over t
time steps) approaches $(1 - C/N)^{C^t} \simeq \exp(-C^{t+1}/N)$ for $N \gg 1$.



So we find that in the limit of extreme dilution almost all (i.e. with probability
1) feedback loops are eliminated, and any finite number of neurons have almost
all disjoint clusters of ancestors. So we first dilute the system by taking
$N \to \infty$ and then we take the limit $C \to \infty$ in order to get infinite
average connectivity, allowing us to store infinitely many patterns p.

This implies that for this AED model at any given time step t all spins are
uncorrelated and, hence, the first step dynamics describes the full time evolution of
the network.

For the SED model the architecture is still a local Cayley-tree but no longer
directed, and in the limit $N \to \infty$ the number of connections giving
information to the site i still follows a Poisson distribution with mean C.
However, at time t the spins are no longer uncorrelated, causing a feedback from
$t \geq 2$ onwards [61], [62].



2.2.2 First time step 

In order to solve the dynamics we start with a discussion of the first time step
dynamics, the form of which is independent of the architecture. So consider a
FC network. Suppose that the initial configuration of the network $\{\sigma_i(0)\}$ is a
collection of i.i.d.r.v. with mean $E[\sigma_i(0)] = 0$, variance
$\mathrm{Var}[\sigma_i(0)] = a_0$, and correlated with only one stored pattern, say the
first one $\{\xi_i^1\}$:

$$ E[\xi_i^\mu \sigma_j(0)] = \delta_{i,j}\, \delta_{\mu,1}\, m_0^1 A, \qquad m_0^1 > 0. \qquad (12) $$

This pattern is said to be condensed. By the law of large numbers (LLN) one gets
for the main overlap and the activity at t = 0

$$ m^1(0) \equiv \lim_{N \to \infty} m_N^1(0) = \frac{1}{A} E[\xi_i^1 \sigma_i(0)] = m_0^1 \qquad (13) $$

$$ a(0) \equiv \lim_{N \to \infty} a_N(0) = E[\sigma_i^2(0)] = a_0 \qquad (14) $$

where the convergence is in probability (e.g., [85]). In order to obtain the
configuration at t = 1 we first have to calculate the local field (1) at t = 0. To do
this we employ the signal-to-noise ratio analysis (see, e.g., [40], [45]). Recalling
the learning rule (3) we separate the part containing the condensed pattern, i.e.,
the signal, from the rest, i.e., the noise, to arrive at

$$ h_i(\sigma_N(0)) = \xi_i^1 \frac{1}{NA} \sum_{j \neq i} \xi_j^1 \sigma_j(0) + \frac{1}{NA} \sum_{\mu=2}^{p = \alpha N} \sum_{j \neq i} \xi_i^\mu \xi_j^\mu \sigma_j(0) \qquad (15) $$

where $\alpha = p/N$. The properties of the initial configurations (12)-(14) assure
us that the summation in the first term on the r.h.s. of (15) converges in the limit
$N \to \infty$ to

$$ \lim_{N \to \infty} \frac{1}{NA} \sum_{j \neq i} \xi_j^1 \sigma_j(0) = m^1(0). \qquad (16) $$

The first term $\xi_i^1 m^1(0)$ is independent of the second term on the r.h.s. of (15).
This second term contains the influence of the non-condensed patterns causing the
intrinsic noise in the dynamics of the main overlap. In view of this we define the
residual overlap

$$ r_N^\mu(t) \equiv \frac{1}{A \sqrt{N}} \sum_{j} \xi_j^\mu \sigma_j(t), \qquad \mu \neq 1. \qquad (17) $$

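Applying the CLT to this second term in (15) we find

$$ \lim_{N \to \infty} \frac{1}{NA} \sum_{\mu=2}^{p} \sum_{j \neq i} \xi_i^\mu \xi_j^\mu \sigma_j(0) = \lim_{N \to \infty} \frac{1}{\sqrt{N}} \sum_{\mu=2}^{p} \xi_i^\mu\, r_N^\mu(0) \qquad (18) $$

$$ = \sqrt{\alpha}\; \mathcal{N}(0, A D(0)) \qquad (19) $$

where the quantity $\mathcal{N}(0, V)$ represents a Gaussian random variable with mean 0
and variance V and where $D(0) = \mathrm{Var}[r^\mu(0)] = a(0)/A$. Thus we see that in
fact the variance of this residual overlap, i.e., D(t), is the relevant quantity
characterising the intrinsic noise.

In conclusion, in the limit $N \to \infty$ the local field is the sum of two independent
random variables, i.e.

$$ h_i(0) \equiv \lim_{N \to \infty} h_{N,i}(0) = \xi_i^1 m^1(0) + \sqrt{\alpha}\; \mathcal{N}(0, a(0)). \qquad (20) $$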



At this point we note that the structure (20) of the distribution of the local field at
time zero, signal plus Gaussian noise, is typical for all architectures discussed
here because the correlations caused by the dynamics only appear for $t \geq 1$. Some
technical details are different for the various architectures. The first change in
details that has to be made is an adaptation of the sum over the sites j: it runs
over all sites for the LF architecture and over the part of the tree connected to
neuron i, which has mean size C, in the ED architectures. The second change is that
for the diluted architectures an additional limit $C \to \infty$ has to be taken
besides the $N \to \infty$ limit. So in the thermodynamic limit $C, N \to \infty$ all
averages will have to be taken over the treelike structure, viz.
$\frac{1}{N} \sum_{i} \to \frac{1}{C} \sum_{i \in \mathrm{tree}}$. Furthermore
$\alpha = p/N$ has to be replaced by $\alpha = p/C$.
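
The signal-plus-Gaussian-noise structure at time zero can be probed with a small simulation. The following sketch is an illustration under stated assumptions: a fully connected Q = 3 network with uniform patterns (A = 2/3), Hebb couplings normalised by $NA$, and a hypothetical initial state that copies the first pattern with probability `q` and is an independent uniform draw otherwise (so that the expected initial overlap is `q`). It returns the empirical overlap, activity and the sample variance of the noise part of the local field, which should be close to $\alpha\, a(0)$ for large N.

```python
import random

def first_step_field_stats(N=2000, p=400, q=0.6, seed=0):
    """Empirical (m0, a0, noise variance, alpha) at t = 0 (illustrative sketch)."""
    rng = random.Random(seed)
    S = (-1, 0, 1)
    A = 2.0 / 3.0
    xi = [[rng.choice(S) for _ in range(N)] for _ in range(p)]
    # sigma_i(0) copies xi^1_i with probability q, else is an independent draw
    sigma = [xi[0][i] if rng.random() < q else rng.choice(S) for i in range(N)]
    m0 = sum(xi[0][i] * sigma[i] for i in range(N)) / (N * A)
    a0 = sum(s * s for s in sigma) / N
    # pattern sums: sum_j xi^mu_j sigma_j for every pattern mu
    sums = [sum(x * s for x, s in zip(row, sigma)) for row in xi]
    noise = []
    for i in range(N):
        # non-condensed patterns only (mu >= 2), self-interaction j = i excluded
        h = sum(xi[mu][i] * (sums[mu] - xi[mu][i] * sigma[i]) for mu in range(1, p))
        noise.append(h / (N * A))
    mean = sum(noise) / N
    var = sum((x - mean) ** 2 for x in noise) / N
    return m0, a0, var, p / N
```

Comparing the returned variance with `alpha * a0` gives a finite-size check of the Gaussian noise term; agreement is only expected up to statistical fluctuations of a few percent at these sizes.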

2.2.3 Recursive dynamical scheme 

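The key question is then how these quantities evolve in time under the parallel
dynamics specified before. For a general time step we find from the LLN in the
limit $C, N \to \infty$ for the main overlap and the activity (10)

$$ m^1(t+1) = \frac{1}{A} \langle\langle \xi_i^1 \langle \sigma_i(t+1) \rangle_\beta \rangle\rangle, \qquad a(t+1) = \langle\langle \langle \sigma_i^2(t+1) \rangle_\beta \rangle\rangle \qquad (21) $$

with the thermal average defined as

$$ \langle f(\sigma_i(t+1)) \rangle_\beta = \frac{\sum_{\sigma \in S} f(\sigma) \exp[\frac{1}{2} \beta \sigma (h_i(t) - b\sigma)]}{\sum_{\sigma \in S} \exp[\frac{1}{2} \beta \sigma (h_i(t) - b\sigma)]} \qquad (22) $$

where $h_i(t) = \lim_{N \to \infty} h_{N,i}(t)$. In the above $\langle\langle \cdot \rangle\rangle$
denotes the average both over the distribution of the embedded patterns $\{\xi_i^\mu\}$
and the initial configurations $\{\sigma_i(0)\}$. The average over the latter is hidden
in an average over the local field through the updating rule (8). In the sequel we
focus on zero temperature. Then, using eq. (8), these formulas reduce to

$$ m^1(t+1) = \frac{1}{A} \langle\langle \xi_i^1\, g_b(h_i(t)) \rangle\rangle, \qquad a(t+1) = \langle\langle g_b^2(h_i(t)) \rangle\rangle. \qquad (23) $$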
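
The reduction of the thermal average to the gain function at zero temperature can be illustrated numerically. The sketch below assumes Boltzmann weights $\exp[\frac{1}{2}\beta\sigma(h - b\sigma)]$, i.e. the energy potential $-\frac{1}{2}[h\sigma - b\sigma^2]$; the function name `thermal_average` is hypothetical.

```python
import math

def thermal_average(f, h, b, beta, states):
    """Average of f(sigma) with weights exp[(beta/2) sigma (h - b sigma)]."""
    expo = [0.5 * beta * s * (h - b * s) for s in states]
    top = max(expo)  # subtract the largest exponent for numerical stability
    w = [math.exp(e - top) for e in expo]
    return sum(f(s) * wi for s, wi in zip(states, w)) / sum(w)
```

For large `beta` the average of the neuron state approaches the value of the step gain function at the same local field, while at `beta = 0` all states are equally likely.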

As seen already in the first time step, we have to study carefully the influence of
the non-condensed patterns causing the intrinsic noise in the dynamics of the main
overlap. The method used to obtain these order parameters is then to calculate
the distribution of the local field as a function of time. In order to determine the
structure of the local field we have to concentrate on the evolution of the residual
overlap. The details of this calculation are very technical and depend on the precise
correlations in the system and hence on the architecture of the network [2], [31],
[45], [53], [60]. Here we give a discussion of the results obtained.

In general, the distribution of the local field at time t + 1 is given by

$$ h_i(t+1) = \xi_i^1 m^1(t+1) + \mathcal{N}(0, \alpha a(t+1)) + \chi(t) \left[F\big(h_i(t) - \xi_i^1 m^1(t)\big) + B \alpha \sigma_i(t)\right] \qquad (24) $$

where F and B are binary coefficients given below, which depend on the specific
architecture. From this it is clear that the local field at time t consists of a
discrete part and a normally distributed part, viz.

$$ h_i(t) = M_i(t) + \mathcal{N}(0, V(t)) \qquad (25) $$

where $M_i(t)$ satisfies the recursion relation

$$ M_i(t+1) = \chi(t) \left[F\big(M_i(t) - \xi_i^1 m^1(t)\big) + B \alpha \sigma_i(t)\right] + \xi_i^1 m^1(t+1) \qquad (26) $$
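and where $V(t) = \alpha A D(t)$ with D(t) itself given by the recursion relation

$$ D(t+1) = \frac{a(t+1)}{A} + L \chi^2(t) D(t) + 2 B_2 \chi(t)\, \mathrm{Cov}[\tilde{r}^\mu(t), r^\mu(t)] \qquad (27) $$

where L and $B_2$ are again coefficients specified below. The quantity $\chi(t)$ reads

$$ \chi(t) = \sum_{k=1}^{Q-1} f_{\hat{h}_i^\mu(t)}\big(b(s_{k+1} + s_k)\big)\, (s_{k+1} - s_k) \qquad (28) $$

where $f_{\hat{h}_i^\mu(t)}$ is the probability density of
$\hat{h}_i^\mu(t) = \lim_{N \to \infty} \hat{h}_{N,i}^\mu(t)$ with

$$ \hat{h}_{N,i}^\mu(t) = h_{N,i}(t) - \frac{1}{\sqrt{N}}\, \xi_i^\mu\, r_N^\mu(t). \qquad (29) $$

Furthermore, $\tilde{r}^\mu(t)$ is defined as

$$ \tilde{r}^\mu(t) = \lim_{N \to \infty} \frac{1}{A \sqrt{N}} \sum_{i} \xi_i^\mu\, g_b(\hat{h}_{N,i}^\mu(t)). \qquad (30) $$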



At this point we remark that we made the technical assumption that $\sigma_i(t)$ and
$\hat{h}_{N,i}^\mu(t)$ are only weakly correlated in the limit $N \to \infty$ such that
$\tilde{r}^\mu(t)$ converges to a normal distribution. Finally, as can be read off from
eq. (26) the quantity $M_i(t)$ consists of the signal term and a discrete noise term,
viz.

$$ M_i(t) = \xi_i^1 m^1(t) + B_1 \alpha \chi(t-1) \sigma_i(t-1) + B_2 \sum_{t'=0}^{t-2} \alpha \left[\prod_{s=t'}^{t-1} \chi(s)\right] \sigma_i(t'). \qquad (31) $$
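
As a consistency check, the recursion for the discrete part of the local field can be unrolled numerically and compared with a closed form of the type (31). The sketch below assumes $M_i(0)$ equals the signal term (no discrete noise at time zero), takes arbitrary illustrative values for the signal $\xi_i^1 m^1(t)$, for $\chi(t)$ and for $\sigma_i(t)$, and uses hypothetical function names.

```python
def M_recursive(signal, chi, sigma, alpha, t, F=1, B=1):
    """Iterate M(u+1) = chi(u) [F (M(u) - signal(u)) + B alpha sigma(u)] + signal(u+1)."""
    M = signal[0]  # assumption: only the signal term is present at t = 0
    for u in range(t):
        M = chi[u] * (F * (M - signal[u]) + B * alpha * sigma[u]) + signal[u + 1]
    return M

def M_closed(signal, chi, sigma, alpha, t, B1=1, B2=1):
    """Signal term plus the two discrete noise contributions, summed explicitly."""
    M = signal[t]
    if t >= 1:
        M += B1 * alpha * chi[t - 1] * sigma[t - 1]
    for tp in range(0, t - 1):        # t' = 0, ..., t-2
        prod = 1.0
        for s in range(tp, t):        # s = t', ..., t-1
            prod *= chi[s]
        M += B2 * alpha * prod * sigma[tp]
    return M
```

With `F = B = 1` and `B1 = B2 = 1` the two functions agree at every time step; setting `F = 0` (with `B2 = 0`) or `B = 0` (with `B1 = B2 = 0`) reproduces the shorter memory of the other architectures.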



Since different architectures contain different correlations not all terms in these
final equations are present. In particular we have for the coefficients F, B, L, $B_1$
and $B_2$ introduced above

           F     B     L     B_1   B_2
    FC     1     1     1     1     1
    SED    0     1     0     1     0
    LF     1     0     1     0     0
    AED    0     0     0     0     0            (32)

with B indicating the feedback caused by the symmetry in the architectures and L
the common ancestors contribution.

At this point we remark that in the so-called theory of statistical neurodynamics
[34], [35], [81] one starts from a different approximate local field by leaving out
any discrete noise (the term in $\sigma_i(t)$). As a consequence the covariance in the
recursion relation for D(t) can be written down more explicitly since only Gaussian
noise is involved. For more details we refer to [28], [61], [86].

We still have to determine the probability density $f_{\hat{h}_i^\mu(t)}$ in eq. (28),
which in the thermodynamic limit equals the probability density $f_{h_i(t)}$. This can
be done by looking at the form of $M_i(t)$ given by eq. (31). The evolution equation
tells us that $\sigma_i(t')$ can be replaced by $g_b(h_i(t'-1))$ such that the second and
third terms of $M_i(t)$ are sums of step functions of correlated variables. These are
also correlated through the dynamics with the normally distributed part of $h_i(t)$.
Therefore the local field can be considered as a transformation of a set of
correlated normally distributed variables $x_s$, $s = 0, \ldots, t-2, t$, which we choose
to normalize. Defining the correlation matrix
$W = (\rho(s,s') \equiv E[x_s x_{s'}])$ we arrive at the following
expression for the probability density of the local field at time t

$$ f_{h_i(t)}(y) = \int \prod_{s=0}^{t-2} dx_s\, dx_t\; \delta\Big(y - M_i(t) - \sqrt{\alpha A D(t)}\, x_t\Big)\, \frac{1}{\sqrt{\det(2\pi W)}} \exp\Big(-\frac{1}{2}\, x W^{-1} x^T\Big) \qquad (33) $$

with $x = (x_0, \ldots, x_{t-2}, x_t)$. For the symmetrically diluted case this expression
simplifies to

$$ f_{h_i(t)}(y) = \int \prod_{s=0}^{[t/2]} dx_{t-2s}\; \delta\Big(y - \xi_i^1 m^1(t) - \alpha \chi(t-1) \sigma_i(t-1) - \sqrt{\alpha a(t)}\, x_t\Big)\, \frac{1}{\sqrt{\det(2\pi W)}} \exp\Big(-\frac{1}{2}\, x W^{-1} x^T\Big) \qquad (34) $$

with $x = (\{x_s\}) = (x_{t-2[t/2]}, \ldots, x_{t-2}, x_t)$. The brackets [t/2] denote the
integer part of t/2.

So the local field at time t consists of a signal term, a discrete noise part and
a normally distributed noise part. Furthermore, the discrete noise and the normally
distributed noise are correlated and this prevents us from deriving a closed
expression for the overlap and activity.

Together with the eqs. (23) for $m^1(t+1)$ and $a(t+1)$ the results above form
a recursive scheme in order to obtain the order parameters of the system. The
practical difficulty which remains is the explicit calculation of the correlations in
the network at different time steps as present in eq. (27).

For AED and LF architectures this scheme leads to an explicit form for the
recursion relations for the order parameters

$$ m^1(t+1) = \frac{1}{A} \Big\langle\Big\langle \xi^1(t+1) \int Dz\; g_b\big(\xi^1(t+1)\, m^1(t) + \sqrt{\alpha A D(t)}\, z\big) \Big\rangle\Big\rangle \qquad (35) $$

$$ a(t+1) = \Big\langle\Big\langle \int Dz\; g_b^2\big(\xi^1(t+1)\, m^1(t) + \sqrt{\alpha A D(t)}\, z\big) \Big\rangle\Big\rangle \qquad (36) $$

$$ D(t+1) = \frac{a(t+1)}{A} + \frac{L}{\alpha A} \left[\Big\langle\Big\langle \int Dz\; z\, g_b\big(\xi^1(t+1)\, m^1(t) + \sqrt{\alpha A D(t)}\, z\big) \Big\rangle\Big\rangle\right]^2 \qquad (37) $$

with $Dz = dz\, (2\pi)^{-1/2} \exp(-z^2/2)$. For the AED architecture (L = 0) the second
term on the r.h.s. of (37), coming from the correlations caused by the common
ancestors, is absent. For the LF architecture we remark that this explicit solution
requires an independent choice of the representations of the patterns at different
layers.
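
To illustrate how such recursion relations are iterated in practice, the following sketch treats the simplest case: the Q = 3 AED network at zero temperature with uniform patterns (A = 2/3) and equidistant states, assuming that the noise variance is $\alpha a(t)$ (i.e. $D(t) = a(t)/A$) and that the gain function is $g_b(h) = \mathrm{sign}(h)\,\theta(|h| - b)$. For a step gain function the Gaussian integrals reduce to error functions. The function names are illustrative and not taken from the literature.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def aed_step(m, a, alpha, b):
    """One zero-temperature step (m, a) -> (m', a') for the Q = 3 AED sketch."""
    A = 2.0 / 3.0
    s = math.sqrt(alpha * a)  # noise standard deviation, assuming D(t) = a(t)/A
    new_m = 0.0
    new_a = 0.0
    for xi in (-1, 0, 1):  # uniform patterns: each value has probability 1/3
        if s > 0:
            up = Phi((xi * m - b) / s)      # Pr[h > b], neuron goes to +1
            down = Phi(-(xi * m + b) / s)   # Pr[h < -b], neuron goes to -1
        else:
            up = 1.0 if xi * m > b else 0.0
            down = 1.0 if xi * m < -b else 0.0
        new_m += xi * (up - down) / (3.0 * A)
        new_a += (up + down) / 3.0
    return new_m, new_a

def aed_iterate(m0, a0, alpha, b, steps):
    """Iterate the recursion from the initial overlap m0 and activity a0."""
    m, a = m0, a0
    for _ in range(steps):
        m, a = aed_step(m, a, alpha, b)
    return m, a
```

Starting from `m0 = 1` and `a0 = 2/3` with `b = 0.5`, a small loading such as `alpha = 0.01` keeps the overlap close to one, whereas a large loading such as `alpha = 2.0` drives the overlap to zero while the activity stays finite.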

At finite temperatures analogous recursion relations for the AED and LF networks
can be derived [45], [53] by introducing auxiliary thermal fields [87] in
order to express the stochastic dynamics within the gain function formulation of
the deterministic dynamics. These recursion relations can be solved numerically
and the stationary limit can be discussed (see Section 2.2.4). Furthermore, damage
spreading [42], [88], [89], i.e., the evolution of two network configurations
which are initially close in Hamming distance, can be studied [45], [53]. Finally,
a complete self-control mechanism can be built into the dynamics of these systems
by introducing a time-dependent threshold in the gain function, improving, e.g., the
storage capacity, the basins of attraction of the embedded patterns and the mutual
information content [81]-[83], [90].

For the symmetric networks explicit examples of the dynamical scheme above
and a comparison with numerical simulations have been presented in [2] for the
FC and in [60] for the SED model with equidistant states and a uniform distribution
of the patterns. By using the recursion relations the first few time steps are
written out explicitly and studied numerically. These results are compared with the
literature [34], [35], [40], [41], [61], [62], [81], [91]-[95] where the feedback
correlations for $t \geq 2$ are neglected or approximated in different ways. In the whole
retrieval region of these symmetric networks it is found that the first four or five
time steps calculated by the scheme presented above already give a clear picture
of the time evolution. Explicit results depend of course on the specific values of
the model parameters, e.g., the storage capacity $\alpha$, the initial overlap $m_0$ with
the embedded pattern, the initial neural activity $a_0$ and the value of the gain
parameter b. Furthermore, numerical simulations provide good support for this
scheme, but very recently we have discovered some small deviations, especially close
to the border of retrieval, which cannot be entirely due to finite size effects. This
has been completely understood recently by carefully studying the long time
correlations and the details are being worked out (see [67], [96] and Section 3.2).

2.2.4 Stationary limit: thermodynamic and retrieval properties 

Equilibrium results for the AED and LF Q-Ising models are obtained immediately
by leaving out the time dependence in (35)-(37) (cf. [45], [53]),
since the evolution equations for the local field and the order parameters do not
change their form as time progresses. This still allows small fluctuations in the
configurations $\{\sigma_i\}$. The difference between the fixed-point equations for these
two architectures is that for the AED model the variance of the residual noise,
D(t), is simply proportional to the activity of the neurons at time t while for the
LF model a recursion is needed.

Many detailed results are available on capacity-gain parameter and
temperature-capacity diagrams obtained by numerically solving these equations. In
general, it is necessary to distinguish three different types of solutions. The zero
solution, Z, is determined by m = 0 as well as a = 0. A sustained activity solution,
S, is defined by m = 0 but $a \neq 0$. Finally, there are solutions with both $m \neq 0$
and $a \neq 0$. Nonattracting solutions of the last type are denoted by NR (for
non-retrieval), attracting ones by R (for retrieval). As a typical illustration we show
fig. 1.

For the AED architecture it is important to observe that, for zero temperature,
in the retrieval regime, R is never the only attractor in the (m, a) plane. Its basin of
attraction is always limited by at least one attractor on the axis m = 0. In contrast,
in the case of analog neurons (piecewise linear networks) the retrieval solution is
an attractor for the whole (m, a) plane. Furthermore, at any fixed $\alpha$, a value of
b can be determined where the Hamming distance of R is minimal. The line of
these optimal b is indicated by OPT. It is close to 1/2 for T = 0 and shifts
completely to b = 0 with increasing temperature. Finally, for a finite Q network
two arbitrarily close configurations always repel each other even in the retrieval
regime. For analog networks there exists a transition line in the capacity-gain plane
below which no such "chaotic" behaviour occurs.

For the LF architecture at zero temperature, in contrast to the AED case, the 
retrieval state is always accompanied by an attractor which has zero overlap with 
the embedded pattern. In all cases under consideration the retrieval state disappears 
discontinuously as the storage capacity $\alpha$ increases. Finally, a type of chaoticity
in the network trajectories is always present for arbitrary finite $Q$. However, in the
case of a piecewise linear gain function there exists a dynamical transition towards
chaos in the $(\alpha, b)$-plane. The $(\alpha, b)$-region where chaos does occur is relatively
smaller than in the corresponding AED networks. For further results and especially
for results at finite temperature, we refer to the literature mentioned before.

Next, for the SED and FC architectures the evolution equations for the order
parameters do change their form as time progresses by the explicit appearance of
the $\{\sigma_i(t')\}, t' = 1, \ldots, t$ term. Hence we cannot use the simple procedure above
to obtain the fixed-point equations. Instead we derive the equilibrium results of
our dynamical scheme by requiring through the recursion relations (24) that the
distribution of the local field becomes time-independent. This is clearly an
approximation because fluctuations in the network configuration are no longer allowed.
In fact, it means that out of the discrete part of this distribution, i.e., $M_i(t)$ (recall
Section 2.2), only the $\sigma_i(t - 1)$ term is kept besides, of course, the signal term. This
procedure implies that the main overlap and activity in the fixed point are found from
the definitions (10) and not from leaving out the time dependence in the recursion
relation (23).

Figure 1: The $(\alpha - b)$ diagram for the Q = 3 AED (top) and LF (bottom) network with
uniform patterns at $T = 0$. The curve $\alpha_c(b)$ denotes the boundary of the retrieval region.
The curve $\alpha_s(b)$ is the lower bound for the existence of the sustained activity states. The
full line denotes a second-order transition, the dashed line a first-order one. The line OPT
is the line of best retrieval quality. The structure of the retrieval dynamics is explained: a
denotes an attractor, s a saddle-point, r a repellor.

We start by eliminating the time-dependence in the evolution equations for the
local field (24). This leads to

$$ h_i = \xi_i^1 m + [\chi^{ar}]^{-1} \mathcal{N}(0, \alpha a) + [\chi^{ar}]^{-1} \alpha \chi \sigma_i \tag{38} $$

with $\chi^{ar} = 1 - \Gamma\chi$ being 1 for the SED and $1 - \chi$ for the FC model and $h_i =
\lim_{t \to \infty} h_i(t)$. This expression consists of two parts: a normally distributed
part $\tilde{h}_i = \mathcal{N}(\xi_i^1 m, \alpha a/[\chi^{ar}]^2)$ and some discrete noise part. The discrete noise
comes from the correlations of the $\{\sigma_i(t)\}$ at different time steps (here only the
preceding time step is considered) and is inherent in the SED and FC dynamics.
Employing this expression in the updating rule one finds

$$ \sigma_i = g_b(\tilde{h}_i + [\chi^{ar}]^{-1} \alpha \chi \sigma_i) . \tag{39} $$

This is a self-consistent equation in $\sigma_i$ which in general admits more than one
solution. These types of equation have been solved in the literature in the context
of thermodynamics using a geometric Maxwell construction [39], [97]. We remark
that for analog networks the geometric Maxwell construction is not necessary: the 
fixed-point equation (l39l has only one solution fSF]. 
The Maxwell construction leads to a unique solution

$$ \sigma_i = g_{\tilde{b}}(\tilde{h}_i) , \qquad \tilde{b} = b - [2\chi^{ar}]^{-1} \alpha \chi . \tag{40} $$
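
The multiplicity of solutions of a self-consistent equation of this type, and the selection made by a Maxwell-type rule, can be illustrated numerically. The following Python sketch is not taken from the cited works. It assumes the zero-temperature $Q = 3$ gain function $g_b(h) = \mathrm{sign}(h)\,\Theta(|h| - b)$ and lumps the feedback strength into one hypothetical parameter `kappa` (standing for $\alpha\chi/\chi^{ar}$). It lists every state $\sigma \in \{-1, 0, +1\}$ that satisfies $\sigma = g_b(\tilde{h} + \kappa\sigma)$ and compares the result with $g_{\tilde{b}}(\tilde{h})$ for the shifted threshold $\tilde{b} = b - \kappa/2$.

```python
STATES = (-1, 0, 1)


def gain(h, b):
    """Assumed zero-temperature Q = 3 gain function: sign(h) if |h| > b, else 0."""
    if abs(h) > b:
        return (h > 0) - (h < 0)
    return 0


def self_consistent_states(h_tilde, b, kappa):
    """All sigma with sigma = gain(h_tilde + kappa * sigma, b)."""
    return [s for s in STATES if gain(h_tilde + kappa * s, b) == s]


def maxwell_state(h_tilde, b, kappa):
    """State selected by the shifted threshold b_tilde = b - kappa / 2."""
    return gain(h_tilde, b - kappa / 2.0)


if __name__ == "__main__":
    b, kappa = 0.5, 0.2
    for h_tilde in (0.1, 0.35, 0.45, 0.7):
        print(h_tilde, self_consistent_states(h_tilde, b, kappa),
              maxwell_state(h_tilde, b, kappa))
```

With these illustrative values both $\sigma = 0$ and $\sigma = 1$ are self-consistent in a window of $\tilde{h}$ just below $b$, and the shifted threshold selects one of them on either side of the midpoint of that window.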

We remark that plugging this result into the local field (38) tells us that the
probability distribution of the local field contains $(Q - 1)$ gaps. This gap structure
also depends on the architecture and the most important findings are that dilution
changes the regions of existence of these gaps but not their width. Moreover, the
gaps become typically much bigger when crossing the border of retrieval 1981 - 

iirmi . 

Using the definition of the main overlap and activity (10) in the limit $N \to \infty$
for the FC model and limit $C, N \to \infty$ for the SED model, one finds in the fixed point

From (27) and (28) one furthermore sees that

D = [TT^a/A 

with 

'= = ^((/^"^'(«'"' + ^ 

These resulting equations (41)-(43), obtained through parallel dynamics, turn out to
be the same as the fixed-point equations derived from a replica-symmetric mean-
field theory treatment discussed next.

For symmetric networks (FC and SED) we consider the long time behaviour 
governed by the Hamiltonian 



(41) 
(42) 

(43) 
(44) 
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(45) 



with J"^ given by ^ for the FC and by Q for the SED model. The neurons are 
updated asynchronously according to the transition probability (ISll-©. In order to 
calculate the free energy we use the standard replica method [11], [68]. We
remark that for the SED architecture, we employ the replica method as applied to
dilute spin-glass models [101]-[104]. Starting from the replicated partition
function averaged over the connectivity and the non-condensed patterns and assuming
replica symmetry, we arrive at the free energy which can be written down for
a variable dilution $c = C/N$ with $c$ between 0 (SED) and 1 (FC)

with $s$ the number of condensed patterns which we take to be 1 as before.

with $q$ the Edwards-Anderson spin-glass order parameter and
$\chi = \beta \langle\langle \langle\sigma^2\rangle - \langle\sigma\rangle^2 \rangle\rangle$
the susceptibility (defined before in Section 2.2.3 for zero temperature) in the
stationary limit. We remark that the effective gain parameter $\tilde{b}$ can be negative,
implying that the input-output function reduces to that of 2-Ising-type neurons, i.e.,
$g_{\tilde{b}}(h) = \mathrm{sign}(h)$.

The phase structure of the network is determined by the solution of the fixed- 
point equations for the order parameters 



jDz{a{zf)j^ (49) 
X = -^(( f Dzz{a{z)))) (50) 



which maximize —f3f{P). Here 

Tr„aexp[Pa{'£^mf,C^' + z^ ^. ^ 

{a{z)) = ^ ; — = . (51) 

Trs exp[ /?s {J2f_i "mfiS,^ + z^Jarc — 6s)] 

Explicit expressions for these fixed-point equations for $Q = 3, 4$ and $Q = \infty$ can
be found in [69], [70], [73] for the FC and SED model and for $Q = 3$ in [63] for
variable dilution.
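
The thermal average entering the fixed-point equations is a single-site quantity and is easy to evaluate numerically. The following Python sketch assumes a single $Q = 3$ neuron with Boltzmann weight $\exp[\beta s(h - \tilde{b} s)]$, $s \in \{-1, 0, +1\}$, where the effective field `h` and the effective gain parameter `b` are treated as given inputs (hypothetical names, not quantities computed from the full theory). It returns $\langle\sigma\rangle$, $\langle\sigma^2\rangle$ and the single-site contribution $\beta(\langle\sigma^2\rangle - \langle\sigma\rangle^2)$ to the susceptibility.

```python
import math

STATES = (-1, 0, 1)


def boltzmann_probabilities(h, b, beta):
    """Probabilities proportional to exp[beta * s * (h - b * s)] for s in STATES."""
    exponents = [beta * s * (h - b * s) for s in STATES]
    shift = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thermal_moments(h, b, beta):
    """Return (<sigma>, <sigma^2>) for the single-site measure."""
    probs = boltzmann_probabilities(h, b, beta)
    mean = sum(p * s for p, s in zip(probs, STATES))
    second = sum(p * s * s for p, s in zip(probs, STATES))
    return mean, second


def local_susceptibility(h, b, beta):
    """Single-site fluctuation beta * (<sigma^2> - <sigma>^2)."""
    mean, second = thermal_moments(h, b, beta)
    return beta * (second - mean * mean)


if __name__ == "__main__":
    for h in (0.2, 1.0):
        print(h, thermal_moments(h, 0.5, 50.0))
    # A negative effective gain parameter makes the neuron behave like sign(h).
    print(thermal_moments(0.1, -0.3, 50.0))
```

For large $\beta$ the average approaches the threshold behaviour of the zero-temperature gain function, and for negative `b` it approaches $\mathrm{sign}(h)$, in line with the remark on the 2-Ising-type limit.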

A lot of detailed results are available on the corresponding phase diagrams.
Some typical results are shown in fig. 2. In general, we can distinguish a retrieval
phase ($m \neq 0$), a spin-glass phase ($m = 0$, $q > 0$) and a paramagnetic phase ($m =
0$, $q = 0$). For the FC architecture at zero temperature the results are extremely
dependent on the pattern activity. In the case of uniformly distributed patterns
($A = 2/3$) we see that different retrieval regions show up for small $b$ (the retrieval
region II does not appear for $A < 1/3$) and the capacity is reduced by a factor
compared with the capacity for the AED and LF architectures. The line of optimal
Hamming distance is given exactly by $b = 1/2$ (fig. 2 left); in the AED model we
recall that it is located for the whole retrieval region in the interval $b \in [0.4, 0.5]$
(see fig. 1 top) while in the LF model it bends to smaller $b$ for growing $\alpha$ (see fig. 1
bottom). We remark that for $Q = \infty$ the diagrams for the FC and LF models are
very similar in shape but the capacity is reduced roughly by a factor of 10 in the
former. For non-zero temperatures the situation is complicated and depends very
much on the value of $b$. For $b$ close to and greater than the optimal $b = 1/2$ the
phase diagram is completely different from that of the Hopfield model in the sense
that the paramagnetic phase exists between the retrieval and the spin-glass phase
(see fig. 2 right). We remark that for $Q = \infty$ the diagram is relatively simple again
and qualitatively resembles that of the Hopfield model.

Figure 2: The $(\alpha - b)$, $T = 0$ (left) and $(\alpha - T)$, $b = 1/2$ (right) phase diagram for the FC
Q = 3-Ising model with uniform patterns. The (thin) full curve represents the boundary
of the retrieval region, the thick full curve the thermodynamic transition of the retrieval
state, the long-dashed curve the spin-glass transition, and the dotted curve the optimal gain
parameter. The labels I and II indicate two retrieval regions: in region I $r \approx O(1)$, while in
region II $r \approx O(10)$. The chain curve (very close to the $\alpha$-axis on the right) is the AT-line. The
short-dashed curve indicates the border above which no paramagnetic states exist.

For the SED architecture there are interesting similarities with the AED model.
In fact, we find that the $(\alpha - b)$ phase diagram in fig. 3 is tilted towards higher
$b$-values in comparison with fig. 1 (top) because of the presence of an extended
2-Ising-like region. The critical boundary of this region is independent of $Q$. The
$(\alpha - T)$ diagrams of the $Q = 3$ and $Q = \infty$ models are qualitatively similar.

For variable dilution (all values of $c \in [0, 1]$) one finds some
architecture-independent properties for $\alpha \to 0$, e.g., the optimal value of $b$ being $b = 1/2$ for
$T = 0$. The main dependence of the behaviour of the network on the connectivity
arises for finite $\alpha$. An interesting property is the suppression of the discontinuous
boundary between the retrieval regions I and II (see fig. 2 left) with decreasing
connectivity, disappearing completely for $c \approx 0.63$, making the optimal performance
domain readily accessible to a wide region of network parameters [63].

Finally, the stability of the replica symmetric retrieval solution against
replica-symmetry breaking can be determined by studying the replicon eigenvalue [68],
[105], leading to the de Almeida-Thouless (AT) stability line indicating the
temperatures below which the replica-symmetric approximation is no longer valid. For
more details we refer to the figures shown and to the literature mentioned before.

Figure 3: The $(\alpha - b)$ phase diagram for the Q = 3 SED model with uniform patterns
at $T = 0$. The (thin) full and long-dashed curves denote the boundary of the retrieval
region corresponding respectively to a continuous and discontinuous appearance of the
solution. The dotted curve separates the 2-Ising-like retrieval region. The short-dashed
curve indicates the discontinuous spin-glass transition. The thick full curve represents the
thermodynamic transition for the retrieval state.

3 BEG neural networks 
3.1 The model 

In Section 2.1 it has been mentioned that the mutual information [106], [107] is
the most appropriate concept to measure the retrieval quality for sparsely coded
networks. A natural question is then whether one could use the mutual information
in general in a systematic way to determine a priori an optimal hamiltonian
guaranteeing the best retrieval properties including, e.g., the largest retrieval overlap,
loading capacity, basin of attraction, convergence time, for an arbitrary scalar
valued neuron (spin) model. Optimal means especially that although the network
might start initially far from the embedded pattern it is still able to retrieve it.

This question can be answered positively [50], [108] by presenting a general
scheme in order to express the mutual information as a function of the relevant
macroscopic parameters like, e.g., overlap with the embedded patterns, activity,
... and constructing a hamiltonian from it for general Q-state neural networks. For
Q = 2, one finds back the Hopfield model for biased patterns ensuring that this
hamiltonian is optimal in the sense described above. For Q = 3, one obtains a
Blume-Emery-Griffiths (BEG) type hamiltonian [50] named after the BEG
spin-glass flTl .

This BEG model for a FC architecture can then be described as follows.
Consider a neural network consisting of $N$ neurons which can take values $\sigma_i, i =
1, \ldots, N$ from the discrete set $\mathcal{S} = \{-1, 0, +1\}$. The $p = \alpha N$ patterns to be
stored in this network are supposed to be i.i.d.r.v., $\{\xi_i^\mu\}, \mu = 1, \ldots, p$ with a
probability distribution

$$ p(\xi_i^\mu) = \frac{a}{2}\,\delta(\xi_i^\mu - 1) + \frac{a}{2}\,\delta(\xi_i^\mu + 1) + (1 - a)\,\delta(\xi_i^\mu) \tag{52} $$

with $a$ the activity of the patterns so that

$$ \lim_{N \to \infty} \frac{1}{N} \sum_i (\xi_i^\mu)^2 = a . \tag{53} $$
(We remark that for reasons of convenience the pattern activity in this Section is 
now indicated with a and not with A as in the Q-Ising Section.) 
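
As a quick numerical check of this pattern statistics, the following Python sketch draws i.i.d. three-state patterns with probabilities $a/2$, $a/2$ and $1 - a$ for the values $+1$, $-1$ and $0$, and estimates the activity as the site average of $(\xi_i)^2$. The function names and the use of the standard-library generator are illustrative choices only.

```python
import random


def sample_pattern(n, a, rng):
    """Draw n i.i.d. entries: +1 and -1 with probability a/2 each, 0 otherwise."""
    pattern = []
    for _ in range(n):
        u = rng.random()
        if u < a / 2.0:
            pattern.append(1)
        elif u < a:
            pattern.append(-1)
        else:
            pattern.append(0)
    return pattern


def empirical_activity(pattern):
    """Site average of xi_i^2, which should approach a for large n."""
    return sum(x * x for x in pattern) / len(pattern)


if __name__ == "__main__":
    rng = random.Random(0)
    xi = sample_pattern(20000, 2.0 / 3.0, rng)
    print(empirical_activity(xi))
```

For uniform patterns ($a = 2/3$) the empirical activity fluctuates around $2/3$ with a spread of order $N^{-1/2}$.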

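Given the network configuration at time $t$, $\sigma_N(t) = \{\sigma_j(t)\}, j = 1, \ldots, N$,
the following dynamics is considered. The configuration $\sigma_N(0)$ is chosen as input.
All neurons are updated in parallel according to the deterministic rule at zero
temperature or the transition probability at arbitrary temperature, as for the Q-Ising
model. Here, however, the energy potential $\epsilon_i[s|\sigma_N(t)]$ is different and defined by

$$ \epsilon_i[s|\sigma_N(t)] = -s\,h_i(\sigma_N(t)) - s^2\,\theta_i(\sigma_N(t)) , \tag{54} $$

where the following local fields in neuron $i$ carry all the information

$$ h_{N,i}(t) = \sum_{j \neq i} J_{ij}\,\sigma_j(t) , \qquad \theta_{N,i}(t) = \sum_{j \neq i} K_{ij}\,\sigma_j^2(t) \tag{55} $$

with the obvious shorthand notation for the local fields. The synaptic couplings $J_{ij}$
and $K_{ij}$ are of the Hebb-type

$$ J_{ij} = \frac{1}{a^2 N} \sum_\mu \xi_i^\mu \xi_j^\mu , \qquad K_{ij} = \frac{1}{N} \sum_\mu \eta_i^\mu \eta_j^\mu \tag{56} $$

with

$$ \eta_i^\mu = \frac{1}{a(1 - a)} \left( (\xi_i^\mu)^2 - a \right) . \tag{57} $$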

The first part is the usual rule in a three-state network (recall Section 2.1) that
codifies the patterns, while the second part can be considered as codifying the
fluctuations of the binary active patterns $(\xi_i^\mu)^2$ about their average. That part is
also consistent with the modified Hebb rule for the Hopfield model with biased
patterns. The zero-temperature updating rule is equivalent to using a gain function

$$ \sigma_i(t + 1) = g(h_{N,i}(t), \theta_{N,i}(t)) = \mathrm{sign}(h_{N,i}(t))\,\Theta(|h_{N,i}(t)| + \theta_{N,i}(t)) \tag{58} $$

with $\Theta$ the Heaviside function.
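
The zero-temperature update can be sketched in a few lines of Python. The sketch assumes the Hebb normalisation $J_{ij} = (a^2 N)^{-1}\sum_\mu \xi_i^\mu \xi_j^\mu$, $K_{ij} = N^{-1}\sum_\mu \eta_i^\mu \eta_j^\mu$ with $\eta_i^\mu = ((\xi_i^\mu)^2 - a)/(a(1 - a))$ and no self-couplings; this is our reading of the learning rule and should be checked against the original references. All function names are illustrative.

```python
def eta(x, a):
    """Centred, normalised activity variable (x^2 - a) / (a (1 - a))."""
    return (x * x - a) / (a * (1.0 - a))


def local_fields(patterns, sigma, a, i):
    """Return (h_i, theta_i) for neuron i with Hebb couplings, excluding j = i."""
    n = len(sigma)
    h = 0.0
    theta = 0.0
    for xi in patterns:
        h += xi[i] * sum(xi[j] * sigma[j] for j in range(n) if j != i)
        theta += eta(xi[i], a) * sum(
            eta(xi[j], a) * sigma[j] ** 2 for j in range(n) if j != i
        )
    return h / (a * a * n), theta / n


def beg_update(patterns, sigma, a):
    """One parallel step: sigma_i <- sign(h_i) * Theta(|h_i| + theta_i)."""
    new_sigma = []
    for i in range(len(sigma)):
        h, theta = local_fields(patterns, sigma, a, i)
        if abs(h) + theta > 0:
            new_sigma.append((h > 0) - (h < 0))
        else:
            new_sigma.append(0)
    return new_sigma


if __name__ == "__main__":
    pattern = [1, -1, 0] * 10        # uniform pattern, activity 2/3
    noisy = [-1] + pattern[1:]       # flip the sign of the first neuron
    print(beg_update([pattern], noisy, 2.0 / 3.0) == pattern)
```

With a single stored pattern the field $\theta_i$ is positive on active sites and negative on inactive ones, so the pattern is a fixed point of the update and a single sign error is corrected in one step.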

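The order parameters of this system have been obtained starting from the
mutual information as a measure for the retrieval quality of the system [50], [108].
They are the retrieval overlap, the activity overlap, and the neural activity

$$ m_N^\mu(t) = \frac{1}{aN} \sum_i \xi_i^\mu \sigma_i(t) , \qquad n_N^\mu(t) = \frac{1}{aN} \sum_i (\xi_i^\mu)^2 \sigma_i^2(t) , \qquad q_N(t) = \frac{1}{N} \sum_i \sigma_i^2(t) . \tag{59} $$

(We remark that in this Section the neural activity is now denoted by $q$ instead
of $a$.) Instead of using the activity overlap $n_N^\mu(t)$ itself it is more convenient to
employ the modified activity overlap

$$ l_N^\mu(t) = \frac{1}{1 - a} \left( n_N^\mu(t) - q_N(t) \right) = \frac{1}{N} \sum_i \eta_i^\mu \sigma_i^2(t) . \tag{60} $$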

This parameter can also be called fluctuation overlap since it can be viewed as the
retrieval overlap between the binary states $\sigma_i^2(t)$ and the patterns $\eta_i^\mu$. It is, in
general, independent of the retrieval overlap $m_N^\mu(t)$. It gives rise to new states, the
so-called quadrupolar (or pattern-fluctuation retrieval) states with $m = 0$ but $l \neq 0$.
These states have a retrieval overlap zero but the activity overlap is not, meaning
that the active neurons (±1) coincide with the active patterns but the signs are
not correlated. Hence they carry some retrieval information and they might be
important in practical applications. In pattern recognition, e.g., looking at a black
and white picture on a grey background, these states would describe the situations
where the exact location of the picture with respect to the background is known
but the details of the picture itself are not focused. Furthermore, these states might
be helpful in modelling such focusing problems discussed in the framework of
cognitive neuroscience [6].
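
A small worked example makes the quadrupolar state concrete. The Python sketch below evaluates the retrieval overlap $m$, the activity overlap $n$, the neural activity $q$ and the fluctuation overlap $l = (n - q)/(1 - a)$ for one pattern, assuming the normalisations $m = (aN)^{-1}\sum_i \xi_i\sigma_i$, $n = (aN)^{-1}\sum_i \xi_i^2\sigma_i^2$ and $q = N^{-1}\sum_i \sigma_i^2$ (our reading of the definitions). The test configuration has the correct active sites but signs uncorrelated with the pattern.

```python
def order_parameters(xi, sigma, a):
    """Return (m, n, q, l) for pattern xi and configuration sigma."""
    n_sites = len(xi)
    m = sum(x * s for x, s in zip(xi, sigma)) / (a * n_sites)
    n = sum(x * x * s * s for x, s in zip(xi, sigma)) / (a * n_sites)
    q = sum(s * s for s in sigma) / n_sites
    l = (n - q) / (1.0 - a)
    return m, n, q, l


def fluctuation_overlap(xi, sigma, a):
    """Same quantity l written as the site average of eta_i * sigma_i^2."""
    n_sites = len(xi)
    return sum(
        (x * x - a) / (a * (1.0 - a)) * s * s for x, s in zip(xi, sigma)
    ) / n_sites


if __name__ == "__main__":
    a = 2.0 / 3.0
    xi = [1, -1, 0] * 10
    sigma = [1, 1, 0] * 10   # right active sites, signs not correlated with xi
    print(order_parameters(xi, sigma, a))
    print(fluctuation_overlap(xi, sigma, a))
```

In this configuration the retrieval overlap vanishes while the fluctuation overlap is close to its maximal value, which is the signature of a quadrupolar state.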

The long-time behaviour of this network is governed by the following
Hamiltonian [50], [108], precisely obtained by optimizing the mutual information

$$ H = -\frac{1}{2} \sum_{i,j} J_{ij}\,\sigma_i \sigma_j - \frac{1}{2} \sum_{i,j} K_{ij}\,\sigma_i^2 \sigma_j^2 . \tag{61} $$

Since we want to compare this model with the 3-Ising model and we want to be
able to change the relative importance of the two terms we rewrite the Hamiltonian
as

$$ H = -\frac{A}{2} \sum_{i,j} \tilde{J}_{ij}\,\sigma_i \sigma_j - \frac{B}{2} \sum_{i,j} \tilde{K}_{ij}\,\sigma_i^2 \sigma_j^2 , \tag{62} $$

with

$$ \tilde{J}_{ij} = a J_{ij} , \qquad \tilde{K}_{ij} = a(1 - a) K_{ij} . \tag{63} $$

For

$$ A = \frac{1}{a} , \qquad B = \frac{1}{a(1 - a)} \tag{64} $$

we trivially recover the model above. When we now take $\tilde{K}_{ij} = b\,\delta_{ij}$ and $A =
B = 1$ we obtain the 3-state Ising model (recall eq. (45)). Finally, we find back the
Hopfield model by taking first $B = 0$ and then $a = 1$, again with $A = 1$.
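
The role of the weights $A$ and $B$ can be checked numerically. The following Python sketch builds Hebb couplings under the assumed normalisation $J_{ij} = (a^2 N)^{-1}\sum_\mu \xi_i^\mu \xi_j^\mu$, $K_{ij} = N^{-1}\sum_\mu \eta_i^\mu \eta_j^\mu$ (self-couplings set to zero, an illustrative convention), evaluates a Hamiltonian of the form $-\frac{A}{2}\sum J\sigma\sigma - \frac{B}{2}\sum K\sigma^2\sigma^2$, and verifies that rescaling the couplings by $a$ and $a(1 - a)$ while choosing $A = 1/a$, $B = 1/(a(1 - a))$ leaves the energy unchanged.

```python
def hebb_couplings(patterns, a):
    """Return (J, K) as nested lists, with zero diagonal (assumed convention)."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    K = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        et = [(x * x - a) / (a * (1.0 - a)) for x in xi]
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / (a * a * n)
                    K[i][j] += et[i] * et[j] / n
    return J, K


def energy(J, K, sigma, A=1.0, B=1.0):
    """H = -(A/2) sum J_ij s_i s_j - (B/2) sum K_ij s_i^2 s_j^2."""
    n = len(sigma)
    bilinear = sum(J[i][j] * sigma[i] * sigma[j]
                   for i in range(n) for j in range(n))
    biquadratic = sum(K[i][j] * sigma[i] ** 2 * sigma[j] ** 2
                      for i in range(n) for j in range(n))
    return -0.5 * A * bilinear - 0.5 * B * biquadratic


def rescale(M, factor):
    """Multiply every entry of a nested-list matrix by factor."""
    return [[factor * x for x in row] for row in M]


if __name__ == "__main__":
    a = 2.0 / 3.0
    xi = [1, -1, 0] * 10
    J, K = hebb_couplings([xi], a)
    h_original = energy(J, K, xi)
    h_rescaled = energy(rescale(J, a), rescale(K, a * (1.0 - a)), xi,
                        A=1.0 / a, B=1.0 / (a * (1.0 - a)))
    print(h_original, h_rescaled)
```

Setting `B=0.0` in `energy` switches off the biquadratic term, which is the first step of the Hopfield limit mentioned in the text.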

For the ED and LF architectures we have to adapt the Hebbian learning rule
(56) analogously as in the Q-Ising model. For the ED case both Hebbian weights
are multiplied with the factor $c_{ij} N/C$, where we recall that $c_{ij}$ is a random variable
assuming values and 1 with mean C 0{lnN /N). For the LF architecture we 
consider 

Mt) = ^ E + K^At) = ^ E ^r(i + (65) 

We remark that an underlying assumption that leads to the BEG model and that
should be preserved in any implementation is that the dynamic activity $q \sim a$, as
far as the order of magnitude is concerned.



3.2 Solving the dynamics 

The discussion given in Section 2.2 on the correlations appearing for the various
architectures remains valid for this model. Furthermore, the development of the
recursive scheme presented there can be followed in order to study the time
evolution of the distribution of the local fields $h_i(t)$ and $\theta_i(t)$. This allows one to write
down recursion relations determining the full time evolution of the order
parameters (59)-(60) of the model.

Since the method has been explained already in some detail in Section 2.2 and
the explicit analysis is even more technical we do not write it out here. For the FC
architecture we refer to [58] for the treatment at zero temperature and to [59] for
an extension to arbitrary temperatures. The final results are two recursion relations
of the type studied in Section 2.2.3, one for $h_i(t)$ and one for $\theta_i(t)$.

Also for the BEG network the first few time steps of its evolution have been
worked out analytically and have been compared with numerical simulations for
systems up to $N = 7000$ neurons averaged over 500 runs. As an illustration we
refer to fig. 4 left presenting the order parameter $l$ as a function of $\alpha$ for uniform
patterns and initial conditions $m_0 = l_0 = 0.6$, $q_0 = 0.5$ and $T = 0.5$. We remark
that the maximal capacity for this system is $\alpha_c \approx 0.06$ ([74]). We then learn that
the first time steps agree very well and do give a reasonable estimate of the critical
capacity.

Figure 4: The BEG model on a FC architecture with uniform patterns. Left: Order
parameter $l(t)$ as a function of the capacity $\alpha$ for the first three time steps with initial conditions
$m_0 = l_0 = 0.6$, $q_0 = 0.5$ at $T = 0.5$. Theoretical results (solid lines) versus simulations
(time 1, 2 and 3 represented by a circle, a plus and a times symbol, respectively) are shown.
Right: Order parameters $m(t)$ (bottom three lines) and $l(t)$ (top three lines) as a function
of time with initial conditions $m_0 = l_0 = 0.1$, $q_0 = 0.5$ at $T = 1.1$ and several values of
$\alpha$. Theoretical results (open symbols) versus simulations (full lines for $\alpha = 0.001$, dashed
lines for $\alpha = 0.01$ and dotted lines for $\alpha = 0.1$).

In fig. 4 (right) we examine the order parameters $m$ and $l$ in the quadrupolar phase
($m = 0$, $l > 0$) versus the paramagnetic phase ($m = 0$, $l = 0$), for several values
of $\alpha$. We see that a few time steps do give us already the characteristic behaviour.
When time increases $m$ decreases while $l$ differentiates between the phases, as is
seen in the theoretical results as well as in the simulations. For the quadrupolar
phase ($\alpha = 0.001$) $l$ increases, deep inside the paramagnetic phase ($\alpha = 0.1$) $l$
decreases, while in the intermediate region ($\alpha = 0.01$) the rate of increase of $l$
quickly diminishes and $l$ itself goes to zero.

At this point, we remark that there is a small but visible discrepancy between 
the theory and simulations especially in /(3). It is of the order 0(10^^) and at- 
tributed to finite-size effects. This, and the fact that the signal-to-noise approach 
does not give a closed form solution of the dynamics, has been a motivation to look
at the generating functional approach to solve this dynamics. An extensive report
is beyond the scope of the present review. Essentially it turns out [59] that beyond
the third time step of the dynamics the signal-to-noise analysis as used above is
not entirely correct for those parameters of the system corresponding to spin-glass
behaviour. The reason is that the technical assumption after eq. (30) is not valid in
the spin-glass region but it seems to have little effect in most of the retrieval region
of the networks [59] (and [67] for full details in the simpler case of the Hopfield
model).

To confirm this some further numerical experiments have been done for dif- 
ferent values of the model parameters comparing this limiting normal distribution 
(recall eq. ( BUb ) with simulations for different time steps. A comparison for time 
steps $t = 2$ and $t = 9$ for systems with $N = 2000$ neurons averaged over 250
runs for the initial conditions $m_0 = l_0 = 0.6$, $q_0 = 0.5$, $a = 2/3$ and temperature
$T = 0.2$ as a function of $\alpha$ shows that in the retrieval region ($\alpha < 0.086$) the
simulation results coincide quite well with the limiting distribution, while in the
spin-glass region, certainly from $\alpha \approx 0.11$ onwards, the results for $t = 9$ start
deviating systematically [59]. We remark that the signal-to-noise approach can be
used correctly by refining that technical assumption allowing for the inclusion of
all feedback correlations [67].

Concerning the other architectures we mention again that the AED and LF 
models can be solved exactly |50|, lISTl . ll56l . 11109^. and we study the stationary 
limit in the next subsection. Results on the BEG model with variable dilution,
hence including SED, can be found in [96].

3.3 Thermodynamic and retrieval properties 

Stationary results for the AED and LF architectures are obtained immediately 
through the dynamical approach discussed in the previous Section 3.2. 

The stationary states of the AED network dynamics are shown in Fig. 5, for a
typical activity of $a = 0.8$ and $q \sim a$. The pattern activity is chosen somewhat
larger than $a = 2/3$ (uniform patterns) since for finite loading, $\alpha = 0$, it is easy
to find out that the quadrupolar state only exists for $a > 0.698$. In addition to the
retrieval and quadrupolar phases, $R(m \neq 0, l \neq 0)$ and $Q(m = 0, l \neq 0)$, there is
a self-sustained activity phase $S(m = 0, l = 0)$, also referred to as the zero phase
Z [50], [109]. We remark that the saddle-points have one-dimensional basins of
attraction with attractor directions along $l$, either towards $l^* \neq 0$ or to $l = 0$, and
repeller directions along $m$ away from $m = 0$. Furthermore, at the boundary of the
maximal storage capacity $\alpha_c$, both overlaps, $m$ and $l$, disappear.

A similar behaviour appears for other big values of the pattern activity a, 
whereas for small a there are only R and S phases. The reason for a low-T re- 
trieval phase and the absence of a Q phase is that a finite T is needed for the active 
neurons (±1) to coincide with the active patterns but with uncorrelated signs, such 
that m = 0. 

Again a lot of detailed results are available. The most important ones can be
summarized as follows. Above the threshold $(\alpha, T) = (0.22, 0.45)$ a stable Q
phase starts to appear. For $T$ below that threshold $m$ and $l$ remain finite together,
in a behaviour characteristic for retrieval, up to the maximal $\alpha_c$. In this regime the
fluctuation overlap does not yield anything essentially new that is not contained
in the retrieval overlap. In contrast, above the threshold, $m$ disappears first with
increasing $\alpha$ leaving a finite $l \neq 0$ up to a bigger $\alpha_c$. Hence, first $T$ and then $\alpha$ have
to become large enough for the Q states to appear. Note that the fluctuation overlap
carries a finite information even with $m = 0$ in the Q phase. Thus, although the
information transmitted by the network is mainly in the retrieval phase, there is
also some information due to the Q phase.

Figure 5: The $(T, \alpha)$ phase diagram for the AED BEG network with pattern activity
$a = 0.8$. Full (dotted) thin lines denote discontinuous (continuous) transitions, thick lines
denote the boundary of the R phase. The rightmost lines yield the maximal storage
capacity. The structure of the retrieval dynamics is explained: a denotes an attractor, s a
saddle point, r a repellor.

For small a, the fluctuation overlap "drives" a vanishingly small initial retrieval
overlap, meaning almost no recognition of a given pattern by the network, into an
asymptotic state with finite recognition. This is in contrast with the results for
other three-state networks where first the overlap $m(t)$ becomes non-zero: $m(t)$
drives $l(t)$. Furthermore, with a vanishing initial $m_0$, the states of the network pass
through the vicinity of a saddle point Q, with a finite fluctuation overlap $l$ and still
a vanishing retrieval overlap at small or intermediate times, giving some plateaus
in $q$, $l$ and the information content. It is only in passing beyond those plateaus,
which may take a rather long time, that the states attain the asymptotic behaviour
of the retrieval phase.

In general, the basins of attraction for retrieval and the information content are 
larger in the BEG network than in other three-state networks. These results for the 
dynamics and the stationary states are confirmed by flow diagrams lISTl . 11091 . 

Figure 6: The $(a, \alpha)$, $T = 0$ (left) and $(T, \alpha)$, $a = 0.8$ (right) phase diagram for the LF
BEG network. There is a SG solution everywhere. There is a stable R phase (a second
one with a smaller overlap in the shaded area) below the thick full line. Left: The Q state
appears as a saddle-point below the thin full line. The thick dashed line shows the retrieval
phase boundary for the optimal LF Q = 3-Ising model. Right: There is a stable (saddle-
point) Q phase above (below) the thick (dotted) line, ending discontinuously at the full thin
line.

For the LF architecture some typical phase diagrams are shown in Fig. 6. We
first remark that we need to introduce two further variables in the derivation of the
LF recurrence relations for the variances of the two noises, i.e.

= {{^^{t))i)^^^^ , Pi{t) = ((^'W>^)^^^^ • (66) 

The possible phases are then $R(m > 0, l > 0, q_1 > 0, q_0 > 0)$, $Q(m = 0, l >
0, q_1 > 0, q_0 > 0)$ and $SG(m = 0, l = 0, q_1 > 0, q_0 > 0)$. From Fig. 6 we notice
that a stable Q phase only appears for sufficiently large $T$ and large a, a feature
already seen for the AED architecture. Thus, as $T$ increases, the useful performance
of the network goes over from the retrieval to the pattern-fluctuation retrieval phase.
Furthermore, we see that for intermediate activity $a \in (0.435, 0.727)$ the LF BEG
network has a bigger maximal storage capacity than the optimal LF Ising network
[53], optimal in the sense that the adjustable threshold parameter was chosen to
optimize the storage capacity $\alpha$. The same has been found for the information
content. At larger activity, $a = 0.8$ say, the BEG and Ising networks compete for
better performance at intermediate or larger a values. Moreover, as in the AED
architecture, the flows to the stable solutions are considerably delayed by the saddle
points in the form of slow transients of the dynamics. Finally, a remarkable feature
is the presence of quite large basins of attraction either to the stable R state or to
the stable Q state, even for the fairly high $T$ (and small a). Also, not surprisingly,
one finds a much smaller basin of attraction to the SG states. Similar features have
also been found in the dynamics of the AED network except for the SG states,
which are absent in that case.
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For the symmetric architectures we restrict ourselves here to the FC one. Re- 
sults on the architecture with variable dilution can be found in 1^ . We apply 
directly the standard replica technique in order to calculate the free energy of the 
model. Within the replica-symmetry approximation and for a finite number, s, of 
condensed patterns, we obtain 



i ^ {aA ml + a{l - a)B l^) + ^ log(l - x) + ^ log(l 



ay a 



2(5 
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with the effective Hamiltonian H given by 
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In these expressions the relevant order parameters are 
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where (•)^ represents the thermal average with respect to H. As usual we take 
only one condensed pattern such that the index /i can be dropped. The parameters 
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gi and pi are the Edwards-Anderson order parameters with their conjugate vari- 
ables r respectively u. Finally, x and are the susceptibilities proportional to the 
fluctuation of the m overlap, respectively / overlap. All these parameters are the 
stationary limits of the corresponding parameters considered in the dynamics for 
arbitrary temperatures. We remark that the trace over the neurons and the average
over the patterns can be performed explicitly. The resulting expressions are written
down in [74] and have been solved numerically.
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Figure 7: The BEG (a - T) phase diagram for a = 2/3 (left) and a = 0.8 (right). Dashed 
lines correspond to continuous transitions, while full lines correspond to discontinuous
transitions (in all order parameters). Below the line Tn retrieval states occur. The curve 
Tp represents the thermodynamic transition (shown as a thick line) between retrieval states 
and spin-glass states. The line Tsg denotes the transition from the spin-glass to the para- 
magnetic phase. Below the line Tq quadrupolar states exist and below the line Tp they are 
global minima. In the shaded region two retrieval states coexist. 

From fig. 7 (left) for uniform patterns and a comparison with the results for the
FC Q = 3-Ising model (cfr. fig. 2 right) one learns that $\alpha_c = 0.091$ is almost double
the maximal capacity for the latter, $\alpha_c = 0.046$, in the case of an optimal choice
for the gain parameter $b$, i.e. $b = 1/2$. Compared with the Hopfield model, one
sees that $\alpha_c$ is smaller in the BEG model, 0.091 versus 0.13, but the capacity at the
thermodynamic transition is larger, 0.053 versus 0.051. So a bigger number of the
retrieval states in the BEG network are global minima of the free energy. Finally, one
also notices that the critical curves $T_{SG}$ and $T_R$ end in different temperature points
at $\alpha = 0$ giving rise to a 'crossover' region for small $\alpha$ as it typically occurs in other
multi-state models, e.g., the Potts model [110], [111] and the Ashkin-Teller model [79].
This is related with the fact that for $\alpha = 0$ these models have a discontinuous
transition at Tp. In this crossover region the retrieval states (global minima below Tp)
and the paramagnetic states (local minima below Tp) coexist.

As in the AED and LF architecture the quadrupolar phase is situated in the high
temperature region and we can understand the physics behind it in the following
way. The spin-glass order parameter $q_1$ is zero, meaning that the ±1 spins are not
frozen and as a consequence $m$ can be zero. The fact that $l$ is not zero practically
means that the spins can flip freely between ±1 but the probability that they jump to
0 or vice versa becomes very small. This effect arises from $a > 1/2$ onwards when
the ratio between the second and the first term in the Hamiltonian starts increasing
as $(1 - a)^{-1}$. It implies that the information content of the system is non-zero in
this phase.

Finally, we recall that for $\gamma = a(1 - a)B = 1$ (cfr. eq. S6T\i-S64\i) we recover 
the BEG neural network as studied above. However, it turns out that the maximum 
in the capacity is located at $\gamma = 0.712$ with a corresponding value of $\alpha_c = 0.096$. 
A reason for this is the approximation $q_0 \sim a$ made in order to get the 
mean-field Hamiltonian. The mutual information of the network is optimized under this 
assumption but, in general, it may not be completely realized in a specific model. 
Furthermore, the fact that replica-symmetry breaking may be bigger for larger a, 
as is also indicated by the zero-temperature entropy calculation, could be an extra 
reason for this. For more details we refer to the literature mentioned above. 



4 The Gardner capacity of multi-state models 

In the previous Section it has been found that the capacity and the basin of 
attraction of the BEG network are enlarged in comparison with those of other 
three-state networks. The models considered all have Hebbian-type learning rules. A 
natural question is then whether this improved retrieval quality is restricted to 
the use of the Hebb rule or whether it is an intrinsic property of the BEG model. 
Therefore, we want to answer the following question: given the set of p patterns 
specified above, is there a network (the best possible network of the BEG-type) 
which has these patterns as fixed points of the deterministic form of the dynamics (58)? 

In order to do so we consider the perceptron architecture (N inputs with couplings 
$J_j$ and $K_j$ and 1 output) and we say that a given pattern, $\xi_i^\mu$, 
$i = 1, \ldots, N$, is stored if there exists a corresponding output 

$$\zeta^\mu = g(h^\mu, \theta^\mu) \tag{75}$$

with 

$$h^\mu = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} J_j \, \xi_j^\mu , \qquad \theta^\mu = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} K_j \, (\xi_j^\mu)^2 , \tag{76}$$

and $\{\mathbf{J}, \mathbf{K}\} = \{J_j, K_j\}$ denoting the configurations in the space of 
interactions. The factor $N^{-1/2}$ is introduced to have the weights $J_j$ and $K_j$ of 
order unity (spherical constraint). 
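
As an illustration of the perceptron setup just described, the following minimal sketch computes the two local fields and the resulting output for one random three-state pattern. Several ingredients are assumptions rather than statements from the text: the second field is taken to couple to the squared inputs, the gain function is taken to be sign(h) when |h| + theta > 0 and 0 otherwise (the usual zero-temperature BEG form), and all function names are hypothetical.

```python
import math
import random


def spherical_couplings(n_inputs, rng):
    """Draw a Gaussian vector and rescale it so that its squared norm equals N."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
    scale = math.sqrt(n_inputs / sum(x * x for x in v))
    return [scale * x for x in v]


def local_fields(J, K, xi):
    """Return (h, theta) for one input pattern xi with entries in {-1, 0, +1}."""
    norm = math.sqrt(len(xi))
    h = sum(j * x for j, x in zip(J, xi)) / norm
    # Assumption: the second field couples to the squared inputs.
    theta = sum(k * x * x for k, x in zip(K, xi)) / norm
    return h, theta


def gain(h, theta):
    """Assumed gain function: sign(h) if |h| + theta > 0, otherwise 0."""
    if abs(h) + theta <= 0:
        return 0
    return (h > 0) - (h < 0)


rng = random.Random(0)
N = 200
J = spherical_couplings(N, rng)
K = spherical_couplings(N, rng)
xi = [rng.choice((-1, 0, 1)) for _ in range(N)]  # uniform three-state pattern

h, theta = local_fields(J, K, xi)
print("h =", h, "theta =", theta, "output =", gain(h, theta))
```

With the couplings normalized in this way, both fields stay of order unity as N grows, which is the purpose of the $N^{-1/2}$ prefactor.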
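The aim is then to determine the maximal number of patterns, p, that can be 
stored in the perceptron, in other words to find the maximal value of the loading 
$\alpha = p/N$ for which couplings satisfying (75)-(76) can still be found. Following 
a Gardner-type analysis [21] the fundamental quantity that we want to calculate is 
then the volume fraction of weight space given by 

$$V = \int d\mathbf{J} \, d\mathbf{K} \, \rho(\mathbf{J}, \mathbf{K}) \prod_{\mu=1}^{p} \chi_{\zeta^\mu}(h^\mu, \theta^\mu; \kappa) \tag{77}$$

with the characteristic function 

$$\chi_{\zeta^\mu}(h^\mu, \theta^\mu; \kappa) = \ldots + \left(1 - (\zeta^\mu)^2\right) \Theta(-|h^\mu| - \theta^\mu - \kappa) \tag{78}$$

where $\kappa$ is the imbedding stability parameter measuring the size of the basin of 
attraction for the $\mu$-th pattern and $\rho(\mathbf{J}, \mathbf{K})$ is the following 
normalization factor assuming spherical constraints for the couplings 

$$\rho(\mathbf{J}, \mathbf{K}) = \frac{\delta(\mathbf{J} \cdot \mathbf{J} - N) \, \delta(\mathbf{K} \cdot \mathbf{K} - N)}{\int_{-\infty}^{\infty} d\mathbf{J} \, d\mathbf{K} \, \delta(\mathbf{J} \cdot \mathbf{J} - N) \, \delta(\mathbf{K} \cdot \mathbf{K} - N)} . \tag{79}$$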
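
For very small systems the volume fraction can be estimated directly by sampling couplings uniformly on the two spheres and counting how often all patterns are stored. The sketch below does this for zero stability parameter only, and it rests on assumptions not fixed by the text: a pattern is counted as stored when an assumed gain function (sign(h) if |h| + theta > 0, else 0) reproduces its output, the second field is assumed to couple to the squared inputs, and all names are hypothetical. It is meant to illustrate what the volume fraction measures, not to replace the replica calculation.

```python
import math
import random


def gain(h, theta):
    """Assumed gain function: sign(h) if |h| + theta > 0, otherwise 0."""
    if abs(h) + theta <= 0:
        return 0
    return (h > 0) - (h < 0)


def random_sphere(n_inputs, rng):
    """Uniform point on the sphere of squared radius N (spherical constraint)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
    scale = math.sqrt(n_inputs / sum(x * x for x in v))
    return [scale * x for x in v]


def stores_all(J, K, patterns, outputs):
    """True if every pattern is mapped onto its prescribed output."""
    norm = math.sqrt(len(J))
    for xi, zeta in zip(patterns, outputs):
        h = sum(j * x for j, x in zip(J, xi)) / norm
        theta = sum(k * x * x for k, x in zip(K, xi)) / norm
        if gain(h, theta) != zeta:
            return False
    return True


def volume_fraction(patterns, outputs, n_inputs, samples, rng):
    """Monte Carlo estimate of the fraction of coupling space storing all patterns."""
    hits = 0
    for _ in range(samples):
        J = random_sphere(n_inputs, rng)
        K = random_sphere(n_inputs, rng)
        if stores_all(J, K, patterns, outputs):
            hits += 1
    return hits / samples


rng = random.Random(0)
estimate = volume_fraction([[1, 0, 0, 0]], [1], 4, 4000, rng)
print("estimated volume fraction:", estimate)
```

For this single pattern with one active input, the exact value under the stated assumptions is 3/8, so the estimate should lie close to 0.375. Adding further patterns shrinks the fraction, and the capacity is the loading at which it vanishes.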

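In order to perform the average over the disorder in the input patterns and the 
corresponding output we employ the replica technique to evaluate the entropy per 
site 

$$v = \lim_{N \to \infty} \frac{1}{N} \langle\langle \ln V \rangle\rangle \tag{80}$$

where $\langle\langle \cdots \rangle\rangle$ denotes an average over the statistics of 
inputs $\{\xi_i^\mu\}$ and outputs $\{\zeta^\mu\}$, recalling (52). 

In the replica approach the entropy per site v is computed via the expression 

$$v = \lim_{N \to \infty} \lim_{n \to 0} \frac{1}{Nn} \left( \langle\langle V^n \rangle\rangle - 1 \right) = \lim_{N \to \infty} \lim_{n \to 0} \frac{1}{Nn} \ln \langle\langle V^n \rangle\rangle \tag{81}$$

where $\langle\langle V^n \rangle\rangle$ is the n-times replicated fractional volume 

$$\langle\langle V^n \rangle\rangle = \int \prod_{\alpha=1}^{n} d\mathbf{J}^\alpha \, d\mathbf{K}^\alpha \, \delta(\mathbf{J}^\alpha \cdot \mathbf{J}^\alpha - N) \, \delta(\mathbf{K}^\alpha \cdot \mathbf{K}^\alpha - N) \times \Big\langle\Big\langle \prod_{\mu=1}^{p} \prod_{\alpha=1}^{n} \chi_{\zeta^\mu}(h^\mu_\alpha, \theta^\mu_\alpha; \kappa) \Big\rangle\Big\rangle \tag{82}$$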
whereby we can forget, since the couplings are continuous, about constant terms 
such as the denominator in (79). The replica-symmetric calculation then proceeds 
in a standard way, although the technical details are much more complicated, and 
an analytic formula can be obtained |T7|. 
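
The identity behind the replica approach, namely that both $(\langle\langle V^n \rangle\rangle - 1)/n$ and $\ln \langle\langle V^n \rangle\rangle / n$ tend to $\langle\langle \ln V \rangle\rangle$ as n goes to zero, can be checked numerically on any positive random variable. The sketch below uses uniformly distributed stand-in "volumes", which is purely an assumption for illustration and has nothing to do with the actual perceptron volume.

```python
import math
import random

rng = random.Random(1)
# Hypothetical positive "volumes"; any positive random variable works here.
volumes = [rng.uniform(0.05, 1.0) for _ in range(10000)]


def mean(values):
    return sum(values) / len(values)


# Quenched average <<ln V>> over the sample.
quenched = mean([math.log(v) for v in volumes])


def replica_estimates(n):
    """Return ((<<V^n>> - 1) / n, ln <<V^n>> / n) for replica number n."""
    moment = mean([v ** n for v in volumes])
    return (moment - 1.0) / n, math.log(moment) / n


for n in (1.0, 0.1, 0.01, 0.001):
    linear, logarithmic = replica_estimates(n)
    print(f"n = {n:<6} linear = {linear:.5f}  log = {logarithmic:.5f}  "
          f"quenched = {quenched:.5f}")
```

Both estimates approach the quenched average as n decreases, with a deviation that shrinks linearly in n. In the actual calculation the moment is evaluated analytically for integer n and then continued to n going to zero.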

Comparing with analogous discussions in the literature for other three-state 
neuron perceptron models, we recall that for $\kappa = 0$ and uniform patterns the Q = 3 
Ising perceptron can maximally reach an optimal capacity equal to 1.5, depending 
on the separation between the plateaus of the gain function (see fJ5\, fJ6\ for the 
precise details) and the Q = 3 clock and Potts model both reach an optimal capac- 
ity of 2.40 llml . ni3l while the value for the BEG perceptron found here is 2.24. 
Here we have to recall that the Q = 3 Ising perceptron and the BEG perceptron 
have the same topological structure in the neurons, whereas the Q = 3 clock and 
Potts models have a different topology, as explained in the Introduction. Since, in 
general, perceptrons turn out to be very useful models in connection with learning 
and generalization, this is an interesting observation. 

The stability of the replica-symmetric solution has been studied by generalizing 
the de Almeida-Thouless analysis and deriving an analytic expression for the two 
replicon eigenvalues that play a role in the Gardner limit. Breaking only occurs for 
small activities and very small imbedding constants, $\kappa < 0.0061$. This is 
consistent with the stability results found for the Q = 3 Ising perceptrons. 

These results strengthen the idea that the better retrieval properties found for 
the BEG model in comparison with the Q = 3 Ising model are not restricted to the 
specific Hebb rule but are intrinsic to the model. 

5 Concluding remarks 

In this overview we have studied the dynamics and retrieval properties of multi- 
state neural networks based upon spin-glass models. In particular, we have first 
discussed the Q-Ising model and the Blume-Emery-Griffiths model with various 
architectures and Hebbian-type learning rules. The methods used are the signal- 
to-noise analysis and the thermodynamic mean-field replica technique. Then, the 
Gardner optimal capacity for these models has been considered. 

A number of detailed results have been outlined in order to compare the properties 
of the different networks and architectures. The Blume-Emery-Griffiths model, 
obtained by maximizing the mutual information content of networks with scalar 
valued three-state neurons, shows improved retrieval properties in comparison with 
the Q = 3-Ising model. 